
BUG: Series.to_dict does not return native Python types #37648


Merged: 57 commits, merged Feb 19, 2021


@arw2019 (Member) commented Nov 5, 2020

This resolves the return-type issue in `to_dict`. #25969 also discusses the return types of `.items()`, which relates to an outstanding NumPy issue (numpy/numpy#14139); I don't address that part here for now.
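To illustrate why NumPy scalars in a `to_dict` result cause trouble, here is a minimal sketch using only NumPy and the standard library. The `record` dict is a hand-built stand-in for a `to_dict` result, not output produced by pandas: `json.dumps` rejects `np.int64` and `np.bool_`, but accepts the native values returned by `.item()`.

```python
import json

import numpy as np

# Hand-built stand-in for a to_dict() result that still holds NumPy scalars.
record = {"a": np.int64(9), "b": np.bool_(True)}

try:
    json.dumps(record)
    serializable = True
except TypeError:
    # The json module only understands native Python types such as int and bool.
    serializable = False

# .item() converts each NumPy scalar to the closest native Python type.
native = {key: value.item() for key, value in record.items()}
encoded = json.dumps(native)
```

Note that `np.float64` happens to subclass Python `float`, so it serializes without conversion, while `np.int64` and `np.bool_` do not subclass `int` or `bool`.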

@arw2019 (Member, Author) commented Nov 6, 2020

cc @jreback, this is the follow-on to #37571.

@pytest.mark.parametrize(
    "data,dtype",
    (
        [np.int64(9), int],
Contributor (review comment):

add unsigned int as well

Member, Author (reply):

Done
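A short sketch of why unsigned integers deserve their own case, using NumPy only. The `CASES` table is a hypothetical plain-Python stand-in for the parametrized test, not the PR's actual test: `np.uint64` values above the int64 range still convert exactly via `.item()`, whereas routing them through `float` loses precision.

```python
import numpy as np

# (input scalar, expected native type) pairs in the style of the parametrized
# test; the uint64 entries are the kind of case requested in review.
CASES = [
    (np.int64(9), int),
    (np.uint64(9), int),
    (np.uint64(2**64 - 1), int),  # larger than np.iinfo(np.int64).max
    (np.float64(9.5), float),
    (np.bool_(True), bool),
]

for value, expected in CASES:
    native = value.item()
    assert type(native) is expected, (value, type(native))

# .item() keeps the full uint64 value exactly...
big = np.uint64(2**64 - 1).item()
# ...while a float round-trip rounds it to the nearest double, which is 2**64.
via_float = int(float(np.uint64(2**64 - 1)))
```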

@jreback added the Dtype Conversions, IO Data, and Output-Formatting labels on Nov 8, 2020
@jreback (Contributor) left a review:

Looks fine, one comment.

@jreback (Contributor) commented Nov 10, 2020

and pls merge master

@arw2019 (Member, Author) commented Nov 10, 2020

Addressed the comment. CI is green apart from unrelated failures (a plotting test on each failing build).

elif is_integer_dtype(value):
    with suppress(ValueError, TypeError):
        value = int(value)
elif is_bool_dtype(value):
Member (review comment):

Can we use the is_float/is_integer/is_bool checks instead of the is_foo_dtype ones?

Member, Author (reply):

Done
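The review point is that the value being converted is a single scalar, so scalar predicates fit better than dtype predicates. Below is a minimal NumPy-only sketch of that shape. `box_native` is a hypothetical name, and the `isinstance` checks stand in for pandas' scalar checks (pandas exposes `is_bool`, `is_integer` and `is_float` in `pandas.api.types`, but this sketch does not call them). Bool is checked first because Python's `bool` is a subclass of `int`.

```python
import numpy as np


def box_native(value):
    """Hypothetical helper: convert a NumPy scalar to a native Python scalar.

    Values that are not bool/integer/float scalars are returned unchanged.
    """
    # Check bool first: Python's bool is a subclass of int.
    if isinstance(value, (bool, np.bool_)):
        return bool(value)
    # Scalar integer check (signed and unsigned NumPy integers included).
    if isinstance(value, (int, np.integer)):
        return int(value)
    # Scalar float check (np.float32, np.float64, ...).
    if isinstance(value, (float, np.floating)):
        return float(value)
    # No native analog handled here: pass the value through as-is.
    return value
```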

@jbrockmendel (Member)

One comment on type/dtype checking; otherwise LGTM.

@jreback (Contributor) left a review:

Small comment; ping when green-ish.

        (Timestamp("2005-02-25"), Timestamp),
        (np.timedelta64(1, "D"), Timedelta),
        (Timedelta(1, "D"), Timedelta),
    ],
Contributor (review comment):

Can you add other types that don't have a native analog (e.g. Period, Interval, bytes) and so are returned unchanged?

Member, Author (reply):

added
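A sketch of the mapping these test cases describe, assuming pandas is installed. `box_value` is a hypothetical helper, not the function added in the PR: NumPy datetime/timedelta scalars map to pandas `Timestamp`/`Timedelta`, other NumPy scalars become native Python types, and values with no such analog (Period, Interval, bytes, or an existing Timestamp/Timedelta) are returned as-is.

```python
import numpy as np
import pandas as pd


def box_value(value):
    """Hypothetical helper sketching the expected boxing behaviour."""
    # datetime64/timedelta64 are np.generic subclasses, so check them first.
    if isinstance(value, np.datetime64):
        return pd.Timestamp(value)
    if isinstance(value, np.timedelta64):
        return pd.Timedelta(value)
    if isinstance(value, np.generic):
        # Remaining NumPy scalars (ints, floats, bools) become native types.
        return value.item()
    # Timestamp, Timedelta, Period, Interval, bytes, ...: returned unchanged.
    return value
```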

@arw2019 (Member, Author) commented Feb 17, 2021

@jreback there's a deprecation warning that turns into an error with the NumPy dev build (np_dev):

====================================================================== warnings summary ======================================================================
pandas/tests/io/json/test_json_table_schema.py::TestTableOrientReader::test_comprehensive
pandas/tests/io/json/test_json_table_schema.py::TestTableOrientReader::test_comprehensive
pandas/tests/io/json/test_json_table_schema.py::TestTableOrientReader::test_comprehensive
pandas/tests/io/json/test_json_table_schema.py::TestTableOrientReader::test_comprehensive
  /Users/andrewwieteska/repos/pandas/pandas/io/json/_json.py:177: DeprecationWarning: an integer is required (got type float).  Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.
    return dumps(

I'm positive it's caused by this PR: I don't get the warning locally on master, but I do on this branch. It doesn't look straightforward to debug, so I'd prefer to address it in a follow-on PR. Is that OK?

For now I'm skipping the test on NumPy dev and will open an issue to track it.

@jreback (Contributor) commented Feb 17, 2021

This is handled by #39854.

Don't skip the test, just leave it.

@jreback (Contributor) commented Feb 17, 2021

Actually, you already skipped it, so OK.

@arw2019 (Member, Author) commented Feb 18, 2021

Should get to green now.

@arw2019 (Member, Author) commented Feb 18, 2021

@jreback ready to go, I believe.

@jreback merged commit 23d8c1c into pandas-dev:master on Feb 19, 2021

@jreback (Contributor) commented Feb 19, 2021

thanks @arw2019

@simonvanderveldt commented

@arw2019 A question about this: was this also supposed to return datetime objects instead of pandas Timestamps? I still seem to get pandas Timestamp objects out of to_dict(). I'm wondering whether this is a bug or intended.

>>> import pandas as pd
>>> import numpy as np

>>> pd.__version__
'1.5.3'

>>> date_rng = pd.date_range(start='1/1/2020', end='1/10/2020', freq='D')
>>> data = {'date': date_rng, 'data1': np.random.randint(0,100,size=(len(date_rng))), 'data2': np.random.randint(0,100,size=(len(date_rng)))}
>>> df = pd.DataFrame(data)
>>> df.to_dict(orient='records')
[{'date': Timestamp('2020-01-01 00:00:00'), 'data1': 15, 'data2': 29}, {'date': Timestamp('2020-01-02 00:00:00'), 'data1': 69, 'data2': 79}, {'date': Timestamp('2020-01-03 00:00:00'), 'data1': 5, 'data2': 35}, {'date': Timestamp('2020-01-04 00:00:00'), 'data1': 77, 'data2': 7}, {'date': Timestamp('2020-01-05 00:00:00'), 'data1': 65, 'data2': 8}, {'date': Timestamp('2020-01-06 00:00:00'), 'data1': 98, 'data2': 79}, {'date': Timestamp('2020-01-07 00:00:00'), 'data1': 84, 'data2': 49}, {'date': Timestamp('2020-01-08 00:00:00'), 'data1': 71, 'data2': 58}, {'date': Timestamp('2020-01-09 00:00:00'), 'data1': 85, 'data2': 10}, {'date': Timestamp('2020-01-10 00:00:00'), 'data1': 73, 'data2': 10}]
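If plain `datetime.datetime` objects are needed downstream, one option is to post-process the records. A minimal sketch, assuming pandas is installed and the timestamps have no sub-microsecond part (`Timestamp.to_pydatetime()` drops nanoseconds); fixed sample data replaces the random columns above so the result is deterministic.

```python
import datetime

import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.date_range(start="2020-01-01", periods=3, freq="D"),
        "data1": [15, 69, 5],
    }
)

records = [
    {
        # Timestamp -> datetime.datetime; every other value is left untouched.
        key: value.to_pydatetime() if isinstance(value, pd.Timestamp) else value
        for key, value in row.items()
    }
    for row in df.to_dict(orient="records")
]

# Each "date" is now exactly datetime.datetime, not the Timestamp subclass.
assert all(type(row["date"]) is datetime.datetime for row in records)
```

Since `pd.Timestamp` is itself a subclass of `datetime.datetime`, many consumers accept the original records without any conversion.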

@jbrockmendel (Member)

Timestamp can represent points in time that a Python datetime cannot (e.g. nanosecond resolution). I'm pretty sure this is intended.
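A small illustration of the precision point, assuming pandas is installed: a `Timestamp` can carry nanoseconds, which `datetime.datetime` has no field for, so converting it drops them (pandas warns when nonzero nanoseconds are discarded). This is a sketch of one concrete case, not a full account of the design decision.

```python
import datetime
import warnings

import pandas as pd

ts = pd.Timestamp("2020-01-01 00:00:00.000000001")  # 1 nanosecond past midnight

with warnings.catch_warnings():
    # to_pydatetime() warns that the nonzero nanoseconds are being discarded.
    warnings.simplefilter("ignore", UserWarning)
    dt = ts.to_pydatetime()

# Timestamp subclasses datetime.datetime, so it already works where a datetime is expected.
is_datetime_subclass = isinstance(ts, datetime.datetime)
```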

5 participants